Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop   #10385     +/-   ##
===========================================
- Coverage    49.08%   48.95%    -0.13%
===========================================
  Files          763      767        +4
  Lines       125673   126153      +480
===========================================
+ Hits         61689    61764       +75
- Misses       63984    64389      +405

☔ View full report in Codecov by Sentry.
self.disable_lora = False
if mp_moe or is_distributed:
    for p in self.parameters():
        p.is_distributed = is_distributed
Used for EP: `is_distributed` marks parameters that should not be synchronized at the start of training, and `mp_moe` is used by unified checkpoint (uc).
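For context, a minimal pure-Python sketch of the tagging pattern (the names `ToyParam`, `mark_distributed`, and `params_to_sync` are hypothetical illustrations, not PaddleNLP API): parameters carry an `is_distributed` flag, and the trainer's startup broadcast step filters on it so expert-parallel weights are skipped.

```python
class ToyParam:
    """Stand-in for a framework parameter object."""

    def __init__(self, name):
        self.name = name
        self.is_distributed = False  # default: synchronized at startup


def mark_distributed(params, is_distributed):
    # Mirrors the PR's loop: tag every parameter so the trainer
    # can skip the initial broadcast for expert-parallel weights.
    for p in params:
        p.is_distributed = is_distributed
    return params


def params_to_sync(params):
    # Trainer-side filter: only non-distributed params get broadcast.
    return [p.name for p in params if not p.is_distributed]


params = [ToyParam("expert.w1"), ToyParam("expert.w2")]
mark_distributed(params, True)
print(params_to_sync(params))  # []
```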
  level=self.args.fp16_opt_level,
  dtype=self.amp_dtype,
- excluded_layers=[QuantizationLinear] + self._decorate_exclude_layers(model),
+ excluded_layers=[QuantizationLinear, ColumnParallelQuantizationLinear, RowParallelQuantizationLinear]
Prevents the fp32 quantization scales from being cast to bf16.
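A hedged sketch of why the exclusion matters (pure Python; the `cast_for_amp` helper is hypothetical, not the Paddle AMP API): layer types listed in `excluded_layers` are left untouched by the mixed-precision cast, so the fp32 quantization scales they hold stay full precision.

```python
def cast_for_amp(layers, excluded_types, target_dtype="bf16"):
    # Cast every layer to the AMP dtype except excluded types;
    # quantization layers keep fp32 scales to preserve accuracy.
    for layer in layers:
        if layer["type"] in excluded_types:
            continue
        layer["dtype"] = target_dtype
    return layers


layers = [
    {"type": "Linear", "dtype": "fp32"},
    {"type": "QuantizationLinear", "dtype": "fp32"},  # holds an fp32 scale
]
cast_for_amp(layers, {"QuantizationLinear"})
print([l["dtype"] for l in layers])  # ['bf16', 'fp32']
```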
  # Optimize by skipping unused shard files for super large models
- if sharded_metadata is not None and quantization_linear_list is None:
+ if sharded_metadata is not None:
Skip the parameter shards that do not need to be read, to speed up loading.
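The idea can be sketched with a hypothetical helper (`shards_to_load` is illustrative, not the PR's function): the sharded metadata maps checkpoint keys to shard filenames, so only files containing a needed key are opened.

```python
def shards_to_load(weight_map, needed_keys):
    # weight_map: checkpoint key -> shard filename, as found in
    # sharded metadata. Only shards holding a needed key are read.
    return sorted({weight_map[k] for k in needed_keys if k in weight_map})


weight_map = {
    "layers.0.weight": "model-00001.safetensors",
    "layers.1.weight": "model-00002.safetensors",
    "lm_head.weight": "model-00003.safetensors",
}
print(shards_to_load(weight_map, ["layers.0.weight"]))
# ['model-00001.safetensors']
```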
@@ -1,4 +1,4 @@
- # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ # Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
new_weight += self.lora_A @ self.lora_B * self.scaling
self.quantize_weight(new_weight)
self.merged = True
mp_moe = getattr(self.quant_weight, "mp_moe", False)
This is needed by unified checkpoint.
@@ -0,0 +1,154 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
Provided by the Slim folks; the parts we don't need have now been removed.
from .hadamard_utils import random_hadamard_matrix


def quantize_tensorwise(x, quantization_config=None, bit_length=8, state=0, training=False, act_scale=None):
Do the quantization methods used by QAT really need their own file here? Wouldn't it be better to keep all the quantization methods together?
The QAT methods are fairly complex and quite a lot more will be added later, so they live in a separate qat_utils.
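To make the discussion concrete, here is a minimal sketch of what tensor-wise quantization does, assuming a simple abs-max scheme over plain Python lists (the function names and the exact scheme are illustrative, not the PR's implementation): one scale is computed for the whole tensor from its largest absolute value, and every element is rounded to the signed integer grid.

```python
def quantize_tensorwise_sketch(x, bit_length=8):
    # Abs-max tensor-wise quantization: a single scale for the tensor.
    qmax = 2 ** (bit_length - 1) - 1              # 127 for int8
    scale = max(abs(v) for v in x) / qmax or 1.0  # avoid div-by-zero
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    return q, scale


def dequantize_sketch(q, scale):
    return [v * scale for v in q]


q, s = quantize_tensorwise_sketch([-1.0, 0.0, 0.5, 1.0])
print(q)  # [-127, 0, 64, 127]
```

Because the scale is shared across the tensor, a single outlier inflates the quantization step for every element, which is exactly the problem the Hadamard rotation discussed below this function is meant to mitigate.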
paddlenlp/quantization/qat_utils.py
Outdated
if quantization_config.apply_hadamard:
    target_x = x @ infohub.hadamard[x.shape[-1]][0]
else:
    target_x = x.clone()
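A short NumPy illustration of the Hadamard trick used above (NumPy stands in for Paddle; `orthonormal_hadamard` is a hypothetical helper, not `random_hadamard_matrix` itself): multiplying by an orthonormal Hadamard matrix spreads channel outliers across all dimensions before quantization, and because the matrix is orthogonal the rotation is exactly invertible.

```python
import numpy as np


def orthonormal_hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


H = orthonormal_hadamard(4)
x = np.array([[3.0, -1.0, 0.5, 2.0]])
rotated = x @ H            # outliers are smeared across channels
restored = rotated @ H.T   # orthogonal, so the rotation is invertible
print(np.allclose(restored, x))  # True
```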
input_grad = None

if not quant_weight.stop_gradient:
    weight_grad = paddle.einsum("bsh,bsd->hd", x, grad_output)
Paddle's einsum has pitfalls in some scenarios; check whether einsum is the right choice here.
There was indeed a problem! einsum is much slower than matmul, so I switched to matmul.
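The replacement is a straightforward contraction rewrite, shown here in NumPy for illustration (the shapes `(batch, seq, hidden)` and `(batch, seq, d_out)` are assumed from the einsum subscripts): `"bsh,bsd->hd"` contracts over both batch and sequence, which is equivalent to flattening those two axes and doing a single matmul.

```python
import numpy as np

# x: (batch, seq, hidden), grad_output: (batch, seq, d_out);
# the weight gradient sums over both batch and sequence dims.
rng = np.random.default_rng(0)
b, s, h, d = 2, 3, 4, 5
x = rng.standard_normal((b, s, h))
grad_output = rng.standard_normal((b, s, d))

ein = np.einsum("bsh,bsd->hd", x, grad_output)
# Equivalent matmul: collapse (b, s) into one axis, then one GEMM.
mm = x.reshape(-1, h).T @ grad_output.reshape(-1, d)

print(np.allclose(ein, mm))  # True
```

A single reshaped GEMM hits the vendor BLAS path directly, whereas einsum may plan a less efficient contraction, which matches the slowdown reported above.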
PR types
New features
PR changes
APIs
Description
loralinear